In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np 

This exercise consists of 3 parts. Finish the first part to get a mark of 3.0, the first two parts to get 4.0, and all parts to get 5.0.

Part 1: Linear layer¶

1.1) Let us start with a linear regression problem. Consider a linear function with noise: $y = a \cdot x + b + \text{noise}$.

We use this formula to generate $100$ random samples.

In [2]:
### The number of samples
n = 100 
### parameters of the linear function
a = -2 
b = 3

1.2) Now, let us generate 100 samples and plot them.

In [3]:
### generate equally spaced x-values
x = np.linspace(-1, 1, n) 
### generate y-values (we use a numpy library so we can generate a vector of numbers - y - inline)
y = a * x + b + np.random.normal(scale=0.25, size=n)

plt.scatter(x, y)
Out[3]:
<matplotlib.collections.PathCollection at 0x16267ecb770>

1.3) As you may see, the samples lie - more or less - along a single line. Our aim is now to find the best parameters of a linear function, so that the resulting model describes the given data as well as possible. To this end, we will iteratively search the parameter space and thus update the model. First, we need to define an error function, which tells us how well (or badly) the current model describes the data. Here we use the mean square error function.

We define the mean square error function as:
$\mathrm{MSE} = \dfrac{1}{n}\sum_{i}\left(y_i - \widehat{y}_i \right)^2,$

where $y_i$ are the target (i.e., data) values and $\widehat{y}_i$ are the output (i.e., model) values.

See the MSE (mean square error) function given below.

In [4]:
def mse(y_target, y_calc):
    return ((y_target - y_calc) ** 2).mean()
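As a quick sanity check (restating `mse` here so the snippet is self-contained), the MSE of two small vectors can be verified by hand:

```python
import numpy as np

def mse(y_target, y_calc):
    return ((y_target - y_calc) ** 2).mean()

# Element-wise differences are [0, 0, 1], so MSE = (0 + 0 + 1) / 3
check = mse(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 4.0]))
print(check)  # 0.3333...
```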

1.4) Run the code below for different parameters of the model. Which parameter values give the best (i.e., minimal) MSE?

In [6]:
a_2 = -2
b_2 = 4

y_calc = a_2 * x + b_2
print("MSE  =  " + str(mse(y, y_calc)))

plt.scatter(x, y, label="target")
plt.scatter(x, y_calc, label="calculated")
plt.legend()
MSE  =  1.1260964910234996
Out[6]:
<matplotlib.legend.Legend at 0x1626a1f56d0>
In [7]:
### For a = -2 and b = 3

1.5) We want to find the best possible model parameters automatically. For this reason, we use the gradient of a loss function. The gradient gives the direction of the fastest increase/decrease of a given function. We use this information to update both model parameters. This procedure is performed iteratively: in each iteration, the parameters a and b are slightly modified so that the MSE is reduced (i.e., improved).
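For reference, the chain rule behind these updates can be sketched as follows. Since $\widehat{y}_i = a x_i + b$,

$$\frac{\partial\,\mathrm{MSE}}{\partial \widehat{y}_i} = \frac{2\left(\widehat{y}_i - y_i\right)}{n}, \qquad \frac{\partial\,\mathrm{MSE}}{\partial a} = \sum_i \frac{\partial\,\mathrm{MSE}}{\partial \widehat{y}_i}\, x_i, \qquad \frac{\partial\,\mathrm{MSE}}{\partial b} = \sum_i \frac{\partial\,\mathrm{MSE}}{\partial \widehat{y}_i},$$

and gradient descent then updates $a \leftarrow a - lr \cdot \partial\,\mathrm{MSE}/\partial a$ and $b \leftarrow b - lr \cdot \partial\,\mathrm{MSE}/\partial b$.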

First, finish the function below. It should calculate the batch gradient of the loss function, i.e., the gradient of the MSE with respect to each output separately (y_target and y_calc are arrays, not scalars, so the output should also be an array).

In [8]:
def mse_grad(y_target, y_calc):
    ### derivative of MSE with respect to each element of y_calc
    return 2 * (y_calc - y_target) / len(y_target)

### TEST
print(mse_grad(y, y_calc))
[0.02235589 0.01356948 0.01701623 0.01726151 0.01200682 0.01893844
 0.01371649 0.02212246 0.01724588 0.01118972 0.01497529 0.01999775
 0.02166711 0.01905442 0.01802602 0.01887122 0.01990961 0.01209298
 0.02518662 0.0203895  0.02154965 0.026359   0.02243509 0.01807148
 0.01609106 0.02514202 0.02971261 0.02285466 0.02115794 0.02419784
 0.0288391  0.02472819 0.01380691 0.01759032 0.02317291 0.03312061
 0.0231661  0.02677565 0.02611943 0.02209432 0.01992915 0.01803999
 0.0160261  0.02018262 0.01729606 0.01251018 0.02036561 0.021589
 0.02166479 0.03242597 0.03463066 0.02166299 0.02048121 0.03070221
 0.01755885 0.0147668  0.02487427 0.02298442 0.02334226 0.02890569
 0.02110078 0.01809632 0.01807495 0.02251336 0.0140993  0.01518569
 0.0218093  0.01426817 0.02548795 0.02120752 0.01298135 0.02518039
 0.0151147  0.01375905 0.02250459 0.02695968 0.0294485  0.01924508
 0.01813737 0.01226658 0.01061563 0.01404619 0.02618542 0.01922405
 0.01953262 0.02099147 0.02545169 0.02701675 0.01385266 0.02201702
 0.015944   0.01933118 0.02529843 0.01548347 0.02191483 0.02411586
 0.02351191 0.02301757 0.02275225 0.01985627]
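A finite-difference check is a useful way to confirm a gradient like this. The sketch below (self-contained, restating `mse` and `mse_grad` from above) perturbs each output element and compares the analytic gradient to the numerical derivative:

```python
import numpy as np

def mse(y_target, y_calc):
    return ((y_target - y_calc) ** 2).mean()

def mse_grad(y_target, y_calc):
    return 2 * (y_calc - y_target) / len(y_target)

rng = np.random.default_rng(0)
y_t = rng.normal(size=5)
y_c = rng.normal(size=5)

analytic = mse_grad(y_t, y_c)

# Central finite difference for each output element
eps = 1e-6
numeric = np.empty(5)
for i in range(5):
    y_plus = y_c.copy(); y_plus[i] += eps
    y_minus = y_c.copy(); y_minus[i] -= eps
    numeric[i] = (mse(y_t, y_plus) - mse(y_t, y_minus)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # tiny, on the order of 1e-10
```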

1.6) Fill in the update function to calculate the gradients of parameters $a$ and $b$ based on the gradient of the loss function (grad_y) and the input vector (x). Then update the parameters $a$ and $b$ based on their gradients and the learning rate (lr). To update the parameters, use batch gradient descent.

In [9]:


class LinearLayer:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __call__(self, x):
        return self.a * x + self.b

    def update(self, x, grad_y, lr):
        ### chain rule: grad_y already contains the 1/n factor, so we sum over samples
        grad_a = (grad_y * x).sum()
        grad_b = grad_y.sum()

        self.a -= lr * grad_a
        self.b -= lr * grad_b
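For a linear model there is also a closed-form least-squares solution, which makes a good reference point. The sketch below (a compact, self-contained re-implementation of the gradient-descent loop, with a seed added for reproducibility) compares the fitted parameters to `np.polyfit`:

```python
import numpy as np

# Same data-generating setup as above, with a fixed seed
rng = np.random.default_rng(42)
n = 100
x = np.linspace(-1, 1, n)
y = -2 * x + 3 + rng.normal(scale=0.25, size=n)

# Gradient-descent fit (100 epochs, lr = 0.05)
a, b = 1.1, 2.0
lr = 0.05
for _ in range(100):
    y_calc = a * x + b
    grad_y = 2 * (y_calc - y) / n   # batch gradient of MSE
    a -= lr * (grad_y * x).sum()
    b -= lr * grad_y.sum()

# Closed-form least-squares solution for comparison
a_ls, b_ls = np.polyfit(x, y, 1)
print(a, b)
print(a_ls, b_ls)
```

Both pairs should be close to each other (and to the true a = -2, b = 3, up to noise).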

1.7) Write a Step function which calculates y_calc (the output of the model for input x), the loss of the model, and the gradient of the loss, and then updates the model parameters.

In [10]:
def Step(x, y, model, lr):
    y_calc = model(x)
    loss = mse(y, y_calc)
    grad_y = mse_grad(y, y_calc)
    model.update(x, grad_y, lr)
    return y_calc, loss

1.8) Fit the model for 100 epochs, with learning rate 0.05 and initial parameter values a = 1.1 and b = 2.

In [11]:
model = LinearLayer(1.1, 2)
In [12]:
lr = 0.05
In [13]:
epoch = 100
losses = []
for i in range(epoch):
    y_calc, loss = Step(x, y, model, lr)
    losses.append(loss)
In [14]:
plt.plot(losses)
Out[14]:
[<matplotlib.lines.Line2D at 0x1626a2a2990>]

Animation of the learning process

In [15]:
from matplotlib import animation, rc
rc('animation', html='jshtml')
In [16]:
model = LinearLayer(1.1, 2)
In [17]:
fig = plt.figure()
plt.scatter(x, y)
line, = plt.plot(x, y_calc, ".", c="orange")
plt.close()


def animate(i):
    y_calc, loss = Step(x, y, model, lr)
    line.set_ydata(y_calc)
    return (line,)


animation.FuncAnimation(fig, animate, np.arange(0, epoch), interval=20)
Out[17]:

1.9) Here is an example of how this can be done in PyTorch.

In [18]:
# Imports
import torch
import torch.nn as nn
In [19]:
# Convert numpy arrays to torch tensors; [:, None] adds an extra dimension
xt = torch.FloatTensor(x[:, None])
yt = torch.FloatTensor(y[:, None])
In [20]:
def mse(y_target, y_calc):
    return ((y_target - y_calc) ** 2).mean()
In [21]:
class LinearLayer(nn.Module):
    def __init__(self, a, b):
        super(LinearLayer, self).__init__()  # initialize torch functionality
        # convert a and b to float tensors and wrap them in nn.Parameter;
        # unlike a plain tensor, a Parameter is registered with the module
        # and has its gradient tracked, which is what allows it to be trained
        self.a = nn.Parameter(torch.FloatTensor([a]).view(1, 1))
        self.b = nn.Parameter(torch.FloatTensor([b]))

    # forward is similar to Python's __call__ but is invoked through torch's module machinery
    def forward(self, x):
        return x @ self.a + self.b  # linear equation; @ is matrix multiplication for tensors

    def update(self, lr):
        with torch.no_grad():  # switch off gradient tracking while updating parameters
            self.a.sub_(lr * self.a.grad)  # in-place update of parameter a
            self.a.grad.zero_()  # clear the gradient

            self.b.sub_(lr * self.b.grad)
            self.b.grad.zero_()
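One way to build trust in `loss.backward()` is to check the autograd result against the hand-derived gradient from Part 1.5. A small self-contained sketch (assuming torch is available, as imported above):

```python
import torch

x = torch.linspace(-1, 1, 50).view(-1, 1)
y = -2 * x + 3

a = torch.nn.Parameter(torch.tensor([[1.1]]))
b = torch.nn.Parameter(torch.tensor([2.0]))

y_calc = x @ a + b
loss = ((y - y_calc) ** 2).mean()
loss.backward()  # autograd fills a.grad and b.grad

# Manual chain rule: dMSE/da = sum_i 2*(y_calc_i - y_i)*x_i / n, dMSE/db = sum_i 2*(y_calc_i - y_i) / n
with torch.no_grad():
    grad_y = 2 * (y_calc - y) / len(y)
    manual_a = (grad_y * x).sum()
    manual_b = grad_y.sum()

print(torch.allclose(a.grad, manual_a.view(1, 1)))  # True
print(torch.allclose(b.grad, manual_b.view(1)))     # True
```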
In [22]:
model =  LinearLayer(-1.1, 0.2)
In [23]:
def torchStep(x, y, model, lr):
    y_calc = model(x)  # calculate the output of our model
    loss = mse(y, y_calc)  # calculate the loss
    loss.backward()  # calculate all gradients
    model.update(lr)  # update parameters
    return loss, y_calc
In [24]:
loss, y_calc = torchStep(xt, yt, model, lr)
y_calc = y_calc.detach().cpu()
fig = plt.figure()
plt.scatter(xt[:, 0], yt)
line, = plt.plot(xt[:, 0], y_calc, c="orange")
plt.close()


def animate(i):
    loss, y_calc = torchStep(xt, yt, model, lr)
    y_calc = y_calc.detach().cpu()  #
    line.set_ydata(y_calc)
    return (line,)


animation.FuncAnimation(fig, animate, np.arange(0, 100), interval=20)
Out[24]:
In [25]:
# we can use an optimizer to update parameters based on their gradients
# the simplest one is stochastic gradient descent (SGD)
def torchStep2(x, y, model, optim):
    optim.zero_grad()  # clear gradients
    y_calc = model(x)  # calculate the output of the model
    loss = mse(y, y_calc)  # calculate the loss
    loss.backward()  # calculate all gradients
    optim.step()  # make an optimizer step, which updates the parameters
    return loss, y_calc
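Plain SGD with no momentum performs exactly the same update as the manual `sub_` version above. A small sketch comparing the two on identical parameters:

```python
import torch

# Two copies of the same parameter; one updated by hand, one by torch.optim.SGD
p1 = torch.nn.Parameter(torch.tensor([1.0, -2.0]))
p2 = torch.nn.Parameter(torch.tensor([1.0, -2.0]))
optim = torch.optim.SGD([p2], lr=0.1)

loss1 = (p1 ** 2).sum()
loss1.backward()
with torch.no_grad():
    p1.sub_(0.1 * p1.grad)  # manual update, as in LinearLayer.update

loss2 = (p2 ** 2).sum()
loss2.backward()
optim.step()  # the same update, performed by the optimizer

print(torch.allclose(p1, p2))  # True
```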
In [26]:
model = LinearLayer(-1.1, 0.2)
optim = torch.optim.SGD(model.parameters(), lr)
In [27]:
loss, y_calc = torchStep2(xt, yt, model, optim)
y_calc = y_calc.detach().cpu()
fig = plt.figure()
plt.scatter(xt[:, 0], yt)
line, = plt.plot(xt[:, 0], y_calc, c="orange")
plt.close()


def animate(i):
    loss, y_calc = torchStep2(xt, yt, model, optim)
    y_calc = y_calc.detach().cpu()
    line.set_ydata(y_calc)
    return (line,)


animation.FuncAnimation(fig, animate, np.arange(0, 100), interval=20)
Out[27]:

Part 2: Convolution layer¶

In [28]:
# input image
image = np.array(
    [
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
        [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
    ]
)
In [88]:
plt.imshow(image)
Out[88]:
<matplotlib.image.AxesImage at 0x2e431da3190>

2.1) Write a function which calculates a convolution of an input matrix (image) with a 3x3 kernel (mask) and a bias. Do not use padding, so the output image should have size (input_width - 2) x (input_height - 2).

In [29]:
def Convolution(image, kernel, bias):
    img_out = np.zeros((image.shape[0] - 2, image.shape[1] - 2))
    for i in range(image.shape[0] - 2):
        for j in range(image.shape[1] - 2):
            region = image[i:i+3, j:j+3]  # take a 3x3 patch
            img_out[i, j] = np.sum(region * kernel) + bias
    return img_out
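A quick sanity check of this function: with an identity kernel (1 at the center, 0 elsewhere) and zero bias, the output should equal the interior of the input. Restated here for self-containment:

```python
import numpy as np

def Convolution(image, kernel, bias):
    img_out = np.zeros((image.shape[0] - 2, image.shape[1] - 2))
    for i in range(image.shape[0] - 2):
        for j in range(image.shape[1] - 2):
            img_out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel) + bias
    return img_out

# Identity kernel: picks out the center pixel of each 3x3 patch
identity = np.zeros((3, 3))
identity[1, 1] = 1

img = np.arange(25.0).reshape(5, 5)
out = Convolution(img, identity, 0)
print(np.array_equal(out, img[1:-1, 1:-1]))  # True
```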
In [30]:
# kernel (mask) which is mean filter
kernel = np.ones((3, 3)) / 9
kernel
Out[30]:
array([[0.11111111, 0.11111111, 0.11111111],
       [0.11111111, 0.11111111, 0.11111111],
       [0.11111111, 0.11111111, 0.11111111]])
In [31]:
bias = -0.5
In [32]:
img_out = Convolution(image, kernel, bias)
In [33]:
plt.imshow(img_out)
Out[33]:
<matplotlib.image.AxesImage at 0x16277cf63c0>

2.2) Find kernels (masks) which detect horizontal and vertical lines. Pixels belonging to a line should be greater than zero and all others less than zero. Use 3x3 masks.

Example:
print(Convolution(np.array([[0,0,0,0,0],[0,0,0,0,0],[1,1,1,1,1],[0,0,0,0,0],[0,0,0,0,0]]), kernel_horizontal, -2))
[[-1. -1. -1.]
 [ 1.  1.  1.]
 [-1. -1. -1.]]

In [34]:
kernel_horizontal = np.array([
    [-1, -1, -1],
    [ 2,  2,  2],
    [-1, -1, -1]
])
In [35]:
img_horizontal = Convolution(image, kernel_horizontal, -2)
plt.imshow(img_horizontal)
Out[35]:
<matplotlib.image.AxesImage at 0x16277e74690>
In [36]:
kernel_vertical = np.array([
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1]
])
In [37]:
img_vertical = Convolution(image, kernel_vertical, -2)
plt.imshow(img_vertical)
Out[37]:
<matplotlib.image.AxesImage at 0x162780302d0>

2.3) Complete the function to calculate ReLU.

In [38]:
def relu(x):
    return np.maximum(0, x)
    

2.4) Find bias values such that output image pixels have a value above 0 only if the original pixel is part of a horizontal/vertical line.

In [39]:
plt.imshow(relu(img_horizontal))
plt.show()
plt.imshow(relu(img_vertical))
Out[39]:
<matplotlib.image.AxesImage at 0x1627810fb10>

Part 3: Deep network¶

In [40]:
from sklearn.datasets import load_iris
import pandas as pd
import os
import matplotlib.pyplot as plt
import numpy as np

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["variety"] = data.target
os.makedirs("data", exist_ok=True)
df.to_csv("data/iris.csv", index=False)
print("works")
works
In [41]:
# load iris dataset
df = pd.read_csv('data/iris.csv')
In [42]:
# n - number of elements in dataset
n = len(df)
In [43]:
# useful variables
feature_columns = [ 
                   "sepal length (cm)", 
                   "sepal width (cm)", 
                   "petal length (cm)", 
                   "petal width (cm)" 
                   ] 
target_column = "variety" 
class_number = 3 
feature_number = 4
In [44]:
# dictionaries used to map class names to numbers
#name_to_class = {0: "Setosa", 1: "Versicolor", 2: "Virginica"}
#class_to_name = {"Setosa": 0, "Versicolor": 1, "Virginica": 2}
In [45]:
# conversion of class name
#df[target_column] = df[target_column].apply(lambda x: class_to_name[x])
In [46]:
# take raw numpy data
x = df[feature_columns].values
y = df[target_column].values
In [47]:
# normalize data so that the network input has mean 0 and standard deviation 1
x = (x - x.mean(0)) / x.std(0)
print(x.mean(0))
print(x.std(0))
[-4.73695157e-16 -7.81597009e-16 -4.26325641e-16 -4.73695157e-16]
[1. 1. 1. 1.]
In [48]:
# conversion numpy array to torch tensor
x = torch.FloatTensor(x)
y = torch.LongTensor(y)
In [49]:
# a simple neural network with one hidden layer of hidden_nr neurons
# input_layer computes features which are used by hidden_layer to calculate the prediction
# between input_layer and hidden_layer there is a relu as a nonlinear activation function
# after hidden_layer there is a sigmoid function because we want the network to return a per-class score in the range [0, 1]
class Net(nn.Module):
    def __init__(self, input_nr, hidden_nr, output_nr):
        super(Net, self).__init__()
        self.input_layer = nn.Linear(input_nr, hidden_nr)
        self.hidden_layer = nn.Linear(hidden_nr, output_nr)

    def forward(self, x):
        x = self.input_layer(x)
        x = torch.relu(x)
        x = self.hidden_layer(x)
        return torch.sigmoid(x)

Cross entropy loss equals $- \mathbb{1}[y=0] \log(p_0) - \mathbb{1}[y=1] \log(p_1) - \mathbb{1}[y=2] \log(p_2)$, where $p_0, p_1, p_2$ are the calculated probabilities of classes 0, 1, 2, and $\mathbb{1}[y=k]$ equals 1 when the sample belongs to class $k$ and 0 otherwise.
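For a single sample, only the true-class term survives, so cross entropy reduces to the negative log probability of the true class. A tiny numpy illustration (note that nn.CrossEntropyLoss, used below, expects raw scores and applies log-softmax internally rather than taking ready probabilities):

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])  # predicted class probabilities
y = 0                          # true class index

# Only the term for the true class contributes to the sum
ce = -np.log(p[y])
print(ce)  # ≈ 0.357
```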

In [50]:
loss_func = nn.CrossEntropyLoss()
In [51]:
# accuracy is the fraction of samples classified correctly
def Accuracy(y_target, y_calc):
    prediction_class = y_calc.max(1)[1]
    number_of_correct = (prediction_class == y_target).float().sum()
    return number_of_correct / len(y_target)
In [52]:
def Step(x, y, model, optim):
    optim.zero_grad()
    y_calc = model(x)
    loss = loss_func(y_calc, y)
    loss.backward()
    optim.step()
    acc = Accuracy(y, y_calc)
    return loss, y_calc, acc
In [53]:
# The Train function trains the model for epoch steps and collects metrics (loss and accuracy)
def Train(x, y, model, optim, epoch):
    losses = []
    accuracies = []
    for i in range(epoch):
        loss, y_calc, acc = Step(x, y, model, optim)
        losses.append(loss.item())  
        accuracies.append(acc.item())
    return losses, accuracies
In [54]:
lr = 0.1
In [55]:
# create a model and an optimizer
hidden_nr = 5
model = Net(feature_number, hidden_nr, class_number)
optim = torch.optim.SGD(model.parameters(), lr)
In [56]:
epoch = 200
losses, accuracies = Train(x, y, model, optim, epoch)
In [57]:
plt.plot(losses)
plt.show()
plt.plot(accuracies) 
Out[57]:
[<matplotlib.lines.Line2D at 0x16211446490>]

Part 3:¶

3.1) Create a report testing different values of the learning rate and of the number of neurons in the hidden layer. Run every test 10 times with 200 epochs. Plot the mean of the losses and accuracies for each value in the test case. Make a table of scores after 200 epochs of training, containing the best, worst, mean, and standard deviation of the loss and accuracy (you can use the pandas describe function).

test case 1:
learning rate: [1, 0.5, 0.1, 0.01, 0.001]
number of neurons in hidden layer: 10

test case 2:
number of neurons in hidden layer: [1, 2, 5, 10, 20, 100]
learning rate: 0.1

In [58]:
#test case 1
#learning rate = [1, 0.5, 0.1, 0.01, 0.001]
#hidden_nr = 10
In [59]:
def RunExperiment(lr, hidden_nr, runs=10, epoch=200):
    all_losses = []
    all_accs = []

    for r in range(runs):
        model = Net(feature_number, hidden_nr, class_number)
        optim = torch.optim.SGD(model.parameters(), lr)

        losses, accs = Train(x, y, model, optim, epoch)

        all_losses.append(losses[-1])
        all_accs.append(accs[-1])

    return np.array(all_losses), np.array(all_accs)
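Since the task mentions the pandas describe function, here is a sketch of how per-run scores could be summarized with it (the score values below are randomly generated, for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical final scores of 10 runs (illustrative, not real results)
rng = np.random.default_rng(0)
scores = pd.DataFrame({
    "loss": rng.normal(0.79, 0.02, size=10),
    "acc": rng.normal(0.80, 0.05, size=10),
})

# describe() reports count, mean, std, min, quartiles, and max per column
summary = scores.describe()
print(summary)
```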
In [60]:
lrs = [1, 0.5, 0.1, 0.01, 0.001]
hidden_nr = 10
results_lr = {} 
for lr in lrs: 
    losses, accs = RunExperiment(lr, hidden_nr) 
    results_lr[lr] = {"loss": losses, "acc": accs}
In [61]:
rows = []
for lr in lrs:
    losses = results_lr[lr]["loss"]
    accs = results_lr[lr]["acc"]
    rows.append([
        lr,
        losses.min(), losses.max(), losses.mean(), losses.std(),
        accs.max(), accs.min(), accs.mean(), accs.std()  # for accuracy, best = max and worst = min
    ])

df_lr = pd.DataFrame(rows, columns=[
    "lr",
    "loss_best", "loss_worst", "loss_mean", "loss_std",
    "acc_best", "acc_worst", "acc_mean", "acc_std"
])

df_lr
Out[61]:
      lr  loss_best  loss_worst  loss_mean  loss_std  acc_best  acc_worst  acc_mean   acc_std
0  1.000   0.586839    0.594235   0.590324  0.002275  0.973333   0.973333  0.973333  0.000000
1  0.500   0.614771    0.666700   0.631494  0.015716  0.960000   0.920000  0.948667  0.013013
2  0.100   0.756409    0.816398   0.785878  0.019086  0.853333   0.666667  0.801333  0.063861
3  0.010   1.016545    1.078062   1.050361  0.020011  0.746667   0.333333  0.586667  0.153941
4  0.001   1.073412    1.146376   1.102994  0.021400  0.660000   0.020000  0.338667  0.175621
In [66]:
mean_losses = [results_lr[lr]["loss"].mean() for lr in lrs]
mean_accs = [results_lr[lr]["acc"].mean() for lr in lrs]

plt.figure(figsize=(12,5))

plt.subplot(1,2,1)
plt.plot(lrs, mean_losses, marker="o")
plt.title("Mean Loss vs Learning Rate")
plt.xlabel("Learning Rate")
plt.ylabel("Loss")

plt.subplot(1,2,2)
plt.plot(lrs, mean_accs, marker="o")
plt.title("Mean Accuracy vs Learning Rate")
plt.xlabel("Learning Rate")
plt.ylabel("Accuracy")

plt.show()
In [67]:
#test case 2
#hidden_nr = [1, 2, 5, 10, 20, 100]
#learning rate = 0.1
In [64]:
hidden_list = [1, 2, 5, 10, 20, 100]
lr = 0.1

results_hidden = {}

for h in hidden_list:
    losses, accs = RunExperiment(lr, h)
    results_hidden[h] = {"loss": losses, "acc": accs}
In [65]:
rows = []
for h in hidden_list:
    losses = results_hidden[h]["loss"]
    accs = results_hidden[h]["acc"]
    rows.append([
        h,
        losses.min(), losses.max(), losses.mean(), losses.std(),
        accs.max(), accs.min(), accs.mean(), accs.std()  # for accuracy, best = max and worst = min
    ])

df_hidden = pd.DataFrame(rows, columns=[
    "hidden_nr",
    "loss_best", "loss_worst", "loss_mean", "loss_std",
    "acc_best", "acc_worst", "acc_mean", "acc_std"
])
df_hidden
Out[65]:
   hidden_nr  loss_best  loss_worst  loss_mean  loss_std  acc_best  acc_worst  acc_mean   acc_std
0          1   0.866428    1.103535   1.000002  0.077189  0.746667   0.333333  0.577333  0.162678
1          2   0.817784    0.971421   0.902371  0.049421  0.820000   0.333333  0.643333  0.135228
2          5   0.795410    0.859005   0.828844  0.024600  0.826667   0.666667  0.768667  0.045806
3         10   0.760184    0.811358   0.785148  0.012408  0.846667   0.693333  0.801333  0.044302
4         20   0.744421    0.785279   0.766755  0.013708  0.900000   0.793333  0.838667  0.027777
5        100   0.695304    0.719721   0.713231  0.007144  0.913333   0.886667  0.895333  0.008969
In [68]:
mean_losses = [results_hidden[h]["loss"].mean() for h in hidden_list]
mean_accs = [results_hidden[h]["acc"].mean() for h in hidden_list]

plt.figure(figsize=(12,5))

plt.subplot(1,2,1)
plt.plot(hidden_list, mean_losses, marker="o")
plt.title("Mean Loss vs Hidden Neurons")
plt.xlabel("Hidden Neurons")
plt.ylabel("Loss")

plt.subplot(1,2,2)
plt.plot(hidden_list, mean_accs, marker="o")
plt.title("Mean Accuracy vs Hidden Neurons")
plt.xlabel("Hidden Neurons")
plt.ylabel("Accuracy")

plt.show()